Overview

Dataset statistics

Number of variables: 14
Number of observations: 317890
Missing cells: 48964
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 34.0 MiB
Average record size in memory: 112.0 B
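The percentage rows follow directly from the counts. As a sanity check, the sketch below recomputes the missing-cell share from the figures in this overview. The helper name `missing_pct` is an illustrative choice and not part of pandas-profiling.

```python
def missing_pct(missing: int, rows: int, cols: int = 1) -> float:
    """Share of missing cells, in percent, for a table of rows x cols cells."""
    return 100.0 * missing / (rows * cols)


# Figures copied from the dataset statistics above.
ROWS, COLS, MISSING = 317890, 14, 48964

overall = missing_pct(MISSING, ROWS, COLS)  # relative to all cells
one_column = missing_pct(MISSING, ROWS)     # if one column holds every missing cell

print(f"overall: {overall:.1f}%, single column: {one_column:.1f}%")
```

The single-column figure corresponds to the 15.4% reported for GEMIDDELDE_VERKOOPPRIJS, whose 48964 missing values account for every missing cell in the dataset.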

Variable types

Categorical: 4
DateTime: 1
Numeric: 9

Alerts

VERSIE has constant value "1.0" (Constant)
DATUM_BESTAND has constant value "2022-11-21" (Constant)
PEILDATUM has constant value "2022-11-01" (Constant)
TYPERENDE_DIAGNOSE_CD has a high cardinality: 1895 distinct values (High cardinality)
BEHANDELEND_SPECIALISME_CD is highly overall correlated with AANTAL_PAT_PER_SPC (High correlation)
AANTAL_PAT_PER_ZPD is highly overall correlated with AANTAL_SUBTRAJECT_PER_ZPD (High correlation)
AANTAL_SUBTRAJECT_PER_ZPD is highly overall correlated with AANTAL_PAT_PER_ZPD (High correlation)
AANTAL_PAT_PER_DIAG is highly overall correlated with AANTAL_SUBTRAJECT_PER_DIAG (High correlation)
AANTAL_SUBTRAJECT_PER_DIAG is highly overall correlated with AANTAL_PAT_PER_DIAG (High correlation)
AANTAL_PAT_PER_SPC is highly overall correlated with AANTAL_SUBTRAJECT_PER_SPC (High correlation)
AANTAL_SUBTRAJECT_PER_SPC is highly overall correlated with AANTAL_PAT_PER_SPC (High correlation)
GEMIDDELDE_VERKOOPPRIJS has 48964 (15.4%) missing values (Missing)
AANTAL_SUBTRAJECT_PER_ZPD is highly skewed (γ1 = 21.3579137) (Skewed)
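The skewness alert reports γ1, the moment coefficient of skewness. A minimal sketch of that quantity is given below, assuming the plain population estimator m3 / m2^1.5. The report itself may use a bias-corrected variant, which differs only negligibly at this sample size. The function name `skewness` and the toy inputs are illustrative.

```python
def skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = skewness([1, 2, 3, 4, 5])        # no tail on either side
right_tailed = skewness([1, 1, 1, 1, 100])   # one large value pulls the tail right

print(symmetric, right_tailed)
```

A count column in which most rows hold small values and a handful hold very large ones behaves like the second toy input, only far more extreme.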

Reproduction

Analysis started: 2022-11-22 16:04:36.416574
Analysis finished: 2022-11-22 16:05:04.358308
Duration: 27.94 seconds
Software version: pandas-profiling vdev
Download configuration: config.json

Variables

VERSIE
Categorical

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB
1.0 317890

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 953670
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
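As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It covers only the category breakdown (scripts and blocks are not exposed by the standard library), and the helper name `category_counts` is an assumption made for this example.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character of every value."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# Three copies of the constant VERSIE value stand in for the full column.
counts = category_counts(["1.0"] * 3)
total = sum(counts.values())

for code, count in counts.most_common():
    # "Nd" is Decimal Number, "Po" is Other Punctuation.
    print(code, count, f"{100 * count / total:.1f}%")
```

The two-to-one split between `Nd` and `Po` is the same proportion the report shows for VERSIE (66.7% Decimal Number, 33.3% Other Punctuation).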

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value Count Frequency (%)
1.0 317890 100.0%

Length

Histogram of lengths of the category

Most occurring characters

Value Count Frequency (%)
1 317890 33.3%
. 317890 33.3%
0 317890 33.3%

Most occurring categories

Value Count Frequency (%)
Decimal Number 635780 66.7%
Other Punctuation 317890 33.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 317890 50.0%
0 317890 50.0%
Other Punctuation
Value Count Frequency (%)
. 317890 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 953670 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 317890 33.3%
. 317890 33.3%
0 317890 33.3%

Most occurring blocks

Value Count Frequency (%)
ASCII 953670 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 317890 33.3%
. 317890 33.3%
0 317890 33.3%

DATUM_BESTAND
Categorical

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB
2022-11-21 317890

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 3178900
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-11-21
2nd row: 2022-11-21
3rd row: 2022-11-21
4th row: 2022-11-21
5th row: 2022-11-21

Common Values

Value Count Frequency (%)
2022-11-21 317890 100.0%

Length

Histogram of lengths of the category

Most occurring characters

Value Count Frequency (%)
2 1271560 40.0%
1 953670 30.0%
- 635780 20.0%
0 317890 10.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 2543120 80.0%
Dash Punctuation 635780 20.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
2 1271560 50.0%
1 953670 37.5%
0 317890 12.5%
Dash Punctuation
Value Count Frequency (%)
- 635780 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 3178900 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
2 1271560 40.0%
1 953670 30.0%
- 635780 20.0%
0 317890 10.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 3178900 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
2 1271560 40.0%
1 953670 30.0%
- 635780 20.0%
0 317890 10.0%

PEILDATUM
Categorical

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB
2022-11-01 317890

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 3178900
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-11-01
2nd row: 2022-11-01
3rd row: 2022-11-01
4th row: 2022-11-01
5th row: 2022-11-01

Common Values

Value Count Frequency (%)
2022-11-01 317890 100.0%

Length

Histogram of lengths of the category

Most occurring characters

Value Count Frequency (%)
2 953670 30.0%
1 953670 30.0%
0 635780 20.0%
- 635780 20.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 2543120 80.0%
Dash Punctuation 635780 20.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
2 953670 37.5%
1 953670 37.5%
0 635780 25.0%
Dash Punctuation
Value Count Frequency (%)
- 635780 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 3178900 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
2 953670 30.0%
1 953670 30.0%
0 635780 20.0%
- 635780 20.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 3178900 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
2 953670 30.0%
1 953670 30.0%
0 635780 20.0%
- 635780 20.0%

JAAR
Date

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB
Minimum: 2012-01-01 00:00:00
Maximum: 2022-01-01 00:00:00
Histogram with fixed size bins (bins=11)

BEHANDELEND_SPECIALISME_CD
Real number (ℝ)

Distinct: 28
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 437.52328
Minimum: 301
Maximum: 8418
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 301
5-th percentile: 302
Q1: 305
Median: 313
Q3: 322
95-th percentile: 335
Maximum: 8418
Range: 8117
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 986.54181
Coefficient of variation (CV): 2.2548327
Kurtosis: 61.331728
Mean: 437.52328
Median Absolute Deviation (MAD): 8
Skewness: 7.9525306
Sum: 1.3908428 × 10⁸
Variance: 973264.75
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=28)
Common values
Value Count Frequency (%)
305 44866 14.1%
313 41243 13.0%
303 36603 11.5%
330 25186 7.9%
316 21651 6.8%
308 17059 5.4%
306 13337 4.2%
324 13108 4.1%
301 12777 4.0%
304 10340 3.3%
Other values (18) 81720 25.7%
Minimum 10 values
Value Count Frequency (%)
301 12777 4.0%
302 6969 2.2%
303 36603 11.5%
304 10340 3.3%
305 44866 14.1%
306 13337 4.2%
307 5548 1.7%
308 17059 5.4%
310 3492 1.1%
313 41243 13.0%
Maximum 10 values
Value Count Frequency (%)
8418 4247 1.3%
8416 529 0.2%
1900 210 0.1%
390 862 0.3%
389 3351 1.1%
362 4287 1.3%
361 2279 0.7%
335 3220 1.0%
330 25186 7.9%
329 834 0.3%

TYPERENDE_DIAGNOSE_CD
Categorical

Distinct: 1895
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB
101 1342
402 1301
403 1274
301 1273
201 1197
Other values (1890) 311503

Length

Max length: 4
Median length: 3
Mean length: 3.3520054
Min length: 2

Characters and Unicode

Total characters: 1065569
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 22
Unique (%): < 0.1%

Sample

1st row: 14
2nd row: 07
3rd row: 15
4th row: 09
5th row: 10

Common Values

Value Count Frequency (%)
101 1342 0.4%
402 1301 0.4%
403 1274 0.4%
301 1273 0.4%
201 1197 0.4%
203 1189 0.4%
401 1065 0.3%
404 1054 0.3%
802 1037 0.3%
409 1030 0.3%
Other values (1885) 306128 96.3%

Length

Histogram of lengths of the category

Most occurring characters

Value Count Frequency (%)
1 203860 19.1%
0 195295 18.3%
2 141199 13.3%
3 115462 10.8%
5 82184 7.7%
9 76832 7.2%
4 75670 7.1%
7 62727 5.9%
6 55667 5.2%
8 45901 4.3%
Other values (15) 10772 1.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1054797 99.0%
Uppercase Letter 10772 1.0%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
G 2011 18.7%
M 1820 16.9%
B 1293 12.0%
E 910 8.4%
Z 898 8.3%
D 722 6.7%
A 702 6.5%
F 669 6.2%
C 354 3.3%
K 348 3.2%
Other values (5) 1045 9.7%
Decimal Number
Value Count Frequency (%)
1 203860 19.3%
0 195295 18.5%
2 141199 13.4%
3 115462 10.9%
5 82184 7.8%
9 76832 7.3%
4 75670 7.2%
7 62727 5.9%
6 55667 5.3%
8 45901 4.4%

Most occurring scripts

Value Count Frequency (%)
Common 1054797 99.0%
Latin 10772 1.0%

Most frequent character per script

Latin
Value Count Frequency (%)
G 2011 18.7%
M 1820 16.9%
B 1293 12.0%
E 910 8.4%
Z 898 8.3%
D 722 6.7%
A 702 6.5%
F 669 6.2%
C 354 3.3%
K 348 3.2%
Other values (5) 1045 9.7%
Common
Value Count Frequency (%)
1 203860 19.3%
0 195295 18.5%
2 141199 13.4%
3 115462 10.9%
5 82184 7.8%
9 76832 7.3%
4 75670 7.2%
7 62727 5.9%
6 55667 5.3%
8 45901 4.4%

Most occurring blocks

Value Count Frequency (%)
ASCII 1065569 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 203860 19.1%
0 195295 18.3%
2 141199 13.3%
3 115462 10.8%
5 82184 7.7%
9 76832 7.2%
4 75670 7.1%
7 62727 5.9%
6 55667 5.2%
8 45901 4.3%
Other values (15) 10772 1.0%

ZORGPRODUCT_CD
Real number (ℝ)

Distinct: 6008
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.4149605 × 10⁸
Minimum: 10501002
Maximum: 9.9841808 × 10⁸
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 10501002
5-th percentile: 28999038
Q1: 99799062
Median: 1.4959903 × 10⁸
Q3: 9.90004 × 10⁸
95-th percentile: 9.9051604 × 10⁸
Maximum: 9.9841808 × 10⁸
Range: 9.8791708 × 10⁸
Interquartile range (IQR): 8.9020494 × 10⁸

Descriptive statistics

Standard deviation: 4.2918281 × 10⁸
Coefficient of variation (CV): 0.97211019
Kurtosis: -1.7407123
Mean: 4.4149605 × 10⁸
Median Absolute Deviation (MAD): 1.1960003 × 10⁸
Skewness: 0.46412785
Sum: 1.4034718 × 10¹⁴
Variance: 1.8419788 × 10¹⁷
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
990004009 2321 0.7%
990004007 2283 0.7%
990003004 2220 0.7%
990004006 1846 0.6%
990356076 1688 0.5%
990356073 1558 0.5%
131999228 1483 0.5%
131999164 1469 0.5%
990003007 1447 0.5%
131999194 1344 0.4%
Other values (5998) 300231 94.4%
Minimum 10 values
Value Count Frequency (%)
10501002 9 < 0.1%
10501003 11 < 0.1%
10501004 11 < 0.1%
10501005 11 < 0.1%
10501007 3 < 0.1%
10501008 11 < 0.1%
10501010 11 < 0.1%
10501011 3 < 0.1%
11101002 10 < 0.1%
11101003 11 < 0.1%
Maximum 10 values
Value Count Frequency (%)
998418081 158 < 0.1%
998418080 142 < 0.1%
998418079 38 < 0.1%
998418077 8 < 0.1%
998418076 8 < 0.1%
998418075 6 < 0.1%
998418074 214 0.1%
998418073 214 0.1%
998418072 8 < 0.1%
998418071 8 < 0.1%

AANTAL_PAT_PER_ZPD
Real number (ℝ)

Distinct: 10020
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 512.43038
Minimum: 1
Maximum: 165142
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 14
Q3: 103
95-th percentile: 1739
Maximum: 165142
Range: 165141
Interquartile range (IQR): 100

Descriptive statistics

Standard deviation: 3164.01
Coefficient of variation (CV): 6.1745169
Kurtosis: 407.0903
Mean: 512.43038
Median Absolute Deviation (MAD): 13
Skewness: 16.710391
Sum: 1.6289649 × 10⁸
Variance: 10010959
Monotonicity: Not monotonic
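Several of these statistics are simple functions of the others. The sketch below rechecks them from the AANTAL_PAT_PER_ZPD figures in this section, assuming the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, variance = standard deviation squared, sum = mean × number of non-missing rows). The variable names are illustrative.

```python
# Figures copied from the AANTAL_PAT_PER_ZPD statistics (no missing values).
n_rows = 317890
mean, std = 512.43038, 3164.01
q1, q3 = 3, 103
minimum, maximum = 1, 165142

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance from the standard deviation
total = mean * n_rows            # sum from the mean

print(f"CV={cv:.4f} IQR={iqr} range={value_range} "
      f"variance={variance:.0f} sum={total:.4e}")
```

Because the inputs are the rounded values printed in the report, the recomputed figures agree with the reported ones only up to that rounding.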
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
1 52435 16.5%
2 25640 8.1%
3 16727 5.3%
4 12335 3.9%
5 9596 3.0%
6 8151 2.6%
7 6726 2.1%
8 5793 1.8%
9 5221 1.6%
10 4640 1.5%
Other values (10010) 170626 53.7%
Minimum 10 values
Value Count Frequency (%)
1 52435 16.5%
2 25640 8.1%
3 16727 5.3%
4 12335 3.9%
5 9596 3.0%
6 8151 2.6%
7 6726 2.1%
8 5793 1.8%
9 5221 1.6%
10 4640 1.5%
Maximum 10 values
Value Count Frequency (%)
165142 1 < 0.1%
155884 1 < 0.1%
155025 1 < 0.1%
154269 1 < 0.1%
154184 1 < 0.1%
144724 1 < 0.1%
118395 1 < 0.1%
115938 1 < 0.1%
110520 1 < 0.1%
109675 1 < 0.1%

AANTAL_SUBTRAJECT_PER_ZPD
Real number (ℝ)

HIGH CORRELATION
SKEWED

Distinct: 10729
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 604.99074
Minimum: 1
Maximum: 239709
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 15
Q3: 113
95-th percentile: 1983
Maximum: 239709
Range: 239708
Interquartile range (IQR): 110

Descriptive statistics

Standard deviation: 4067.9191
Coefficient of variation (CV): 6.7239362
Kurtosis: 726.84752
Mean: 604.99074
Median Absolute Deviation (MAD): 14
Skewness: 21.357914
Sum: 1.9232051 × 10⁸
Variance: 16547966
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
1 50514 15.9%
2 25188 7.9%
3 16575 5.2%
4 12118 3.8%
5 9513 3.0%
6 8128 2.6%
7 6698 2.1%
8 5727 1.8%
9 5179 1.6%
10 4634 1.5%
Other values (10719) 173616 54.6%
Minimum 10 values
Value Count Frequency (%)
1 50514 15.9%
2 25188 7.9%
3 16575 5.2%
4 12118 3.8%
5 9513 3.0%
6 8128 2.6%
7 6698 2.1%
8 5727 1.8%
9 5179 1.6%
10 4634 1.5%
Maximum 10 values
Value Count Frequency (%)
239709 1 < 0.1%
232256 1 < 0.1%
231983 1 < 0.1%
230923 1 < 0.1%
227940 1 < 0.1%
227432 1 < 0.1%
223970 1 < 0.1%
222465 1 < 0.1%
218449 1 < 0.1%
215070 1 < 0.1%

AANTAL_PAT_PER_DIAG
Real number (ℝ)

Distinct: 8921
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7681.0088
Minimum: 1
Maximum: 227967
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 41
Q1: 407
Median: 1722
Q3: 6359
95-th percentile: 36582
Maximum: 227967
Range: 227966
Interquartile range (IQR): 5952

Descriptive statistics

Standard deviation: 17806.728
Coefficient of variation (CV): 2.3182798
Kurtosis: 34.399128
Mean: 7681.0088
Median Absolute Deviation (MAD): 1566
Skewness: 5.0786242
Sum: 2.4417159 × 10⁹
Variance: 3.1707956 × 10⁸
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
21 532 0.2%
8 498 0.2%
17 475 0.1%
9 467 0.1%
12 457 0.1%
25 453 0.1%
26 452 0.1%
14 427 0.1%
32 421 0.1%
11 420 0.1%
Other values (8911) 313288 98.6%
Minimum 10 values
Value Count Frequency (%)
1 363 0.1%
2 408 0.1%
3 391 0.1%
4 401 0.1%
5 365 0.1%
6 403 0.1%
7 378 0.1%
8 498 0.2%
9 467 0.1%
10 376 0.1%
Maximum 10 values
Value Count Frequency (%)
227967 23 < 0.1%
221635 23 < 0.1%
217854 24 < 0.1%
214511 17 < 0.1%
213535 25 < 0.1%
211593 17 < 0.1%
210434 19 < 0.1%
205348 17 < 0.1%
200603 16 < 0.1%
198527 20 < 0.1%

AANTAL_SUBTRAJECT_PER_DIAG
Real number (ℝ)

Distinct: 9975
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11050.265
Minimum: 1
Maximum: 369837
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 52
Q1: 539
Median: 2384
Q3: 9076
95-th percentile: 51555
Maximum: 369837
Range: 369836
Interquartile range (IQR): 8537

Descriptive statistics

Standard deviation: 26530.735
Coefficient of variation (CV): 2.4009139
Kurtosis: 38.162106
Mean: 11050.265
Median Absolute Deviation (MAD): 2187
Skewness: 5.3355708
Sum: 3.5127688 × 10⁹
Variance: 7.0387991 × 10⁸
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
11 403 0.1%
17 384 0.1%
39 364 0.1%
13 363 0.1%
25 363 0.1%
52 360 0.1%
18 357 0.1%
23 353 0.1%
33 350 0.1%
5 349 0.1%
Other values (9965) 314244 98.9%
Minimum 10 values
Value Count Frequency (%)
1 288 0.1%
2 317 0.1%
3 320 0.1%
4 307 0.1%
5 349 0.1%
6 338 0.1%
7 317 0.1%
8 317 0.1%
9 267 0.1%
10 316 0.1%
Maximum 10 values
Value Count Frequency (%)
369837 23 < 0.1%
348523 25 < 0.1%
347198 23 < 0.1%
343084 24 < 0.1%
341692 19 < 0.1%
323791 20 < 0.1%
315781 17 < 0.1%
310778 17 < 0.1%
298646 17 < 0.1%
289045 16 < 0.1%

AANTAL_PAT_PER_SPC
Real number (ℝ)

Distinct: 297
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 669808.92
Minimum: 1376
Maximum: 1487642
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 1376
5-th percentile: 42576
Q1: 287349
Median: 746974
Q3: 1026703
95-th percentile: 1340856
Maximum: 1487642
Range: 1486266
Interquartile range (IQR): 739354

Descriptive statistics

Standard deviation: 413120.06
Coefficient of variation (CV): 0.61677301
Kurtosis: -1.1197061
Mean: 669808.92
Median Absolute Deviation (MAD): 314548
Skewness: 0.019103111
Sum: 2.1292556 × 10¹¹
Variance: 1.7066818 × 10¹¹
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
880942 5102 1.6%
874126 4354 1.4%
843981 4347 1.4%
894337 4333 1.4%
880504 4273 1.3%
897712 4212 1.3%
764815 4088 1.3%
776544 3994 1.3%
1081598 3890 1.2%
1100675 3866 1.2%
Other values (287) 275431 86.6%
Minimum 10 values
Value Count Frequency (%)
1376 117 < 0.1%
1610 130 < 0.1%
1702 138 < 0.1%
1920 131 < 0.1%
2255 183 0.1%
2495 173 0.1%
6806 380 0.1%
8364 74 < 0.1%
11100 366 0.1%
11432 438 0.1%
Maximum 10 values
Value Count Frequency (%)
1487642 2975 0.9%
1450406 3048 1.0%
1421746 3564 1.1%
1344568 3543 1.1%
1340856 3441 1.1%
1332481 3545 1.1%
1316690 3463 1.1%
1282965 3576 1.1%
1265249 1177 0.4%
1262541 1201 0.4%

AANTAL_SUBTRAJECT_PER_SPC
Real number (ℝ)

Distinct: 297
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1079155.5
Minimum: 1578
Maximum: 2666528
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 1578
5-th percentile: 46629
Q1: 406625
Median: 1078699
Q3: 1728191
95-th percentile: 2550437
Maximum: 2666528
Range: 2664950
Interquartile range (IQR): 1321566

Descriptive statistics

Standard deviation: 739599.49
Coefficient of variation (CV): 0.68535023
Kurtosis: -0.79326322
Mean: 1079155.5
Median Absolute Deviation (MAD): 649492
Skewness: 0.37588312
Sum: 3.4305275 × 10¹¹
Variance: 5.470074 × 10¹¹
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
1211792 5102 1.6%
1281527 4354 1.4%
1216258 4347 1.4%
1315603 4333 1.4%
1300486 4273 1.3%
1341872 4212 1.3%
1155341 4088 1.3%
1158417 3994 1.3%
2550437 3890 1.2%
2666528 3866 1.2%
Other values (287) 275431 86.6%
Minimum 10 values
Value Count Frequency (%)
1578 117 < 0.1%
1861 130 < 0.1%
1962 138 < 0.1%
2195 131 < 0.1%
2816 173 0.1%
3005 183 0.1%
7385 380 0.1%
8889 74 < 0.1%
12521 366 0.1%
12982 438 0.1%
Maximum 10 values
Value Count Frequency (%)
2666528 3866 1.2%
2620514 3787 1.2%
2595737 3844 1.2%
2564210 3781 1.2%
2550437 3890 1.2%
2482063 3851 1.2%
2179417 3757 1.2%
2062495 3810 1.2%
2052308 1168 0.4%
1990249 1167 0.4%

GEMIDDELDE_VERKOOPPRIJS
Real number (ℝ)

Distinct: 3494
Distinct (%): 1.3%
Missing: 48964
Missing (%): 15.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 3556.834
Minimum: 70
Maximum: 287220
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.4 MiB

Quantile statistics

Minimum: 70
5-th percentile: 140
Q1: 475
Median: 1245
Q3: 4135
95-th percentile: 13425
Maximum: 287220
Range: 287150
Interquartile range (IQR): 3660

Descriptive statistics

Standard deviation: 6515.0129
Coefficient of variation (CV): 1.8316888
Kurtosis: 148.9049
Mean: 3556.834
Median Absolute Deviation (MAD): 1015
Skewness: 7.2718965
Sum: 9.5652513 × 10⁸
Variance: 42445393
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
160 1993 0.6%
105 1919 0.6%
110 1790 0.6%
180 1571 0.5%
185 1482 0.5%
300 1377 0.4%
175 1373 0.4%
120 1362 0.4%
145 1359 0.4%
125 1234 0.4%
Other values (3484) 253466 79.7%
(Missing) 48964 15.4%
Minimum 10 values
Value Count Frequency (%)
70 226 0.1%
75 75 < 0.1%
80 362 0.1%
85 919 0.3%
90 665 0.2%
95 720 0.2%
100 920 0.3%
105 1919 0.6%
110 1790 0.6%
115 978 0.3%
Maximum 10 values
Value Count Frequency (%)
287220 8 < 0.1%
148910 3 < 0.1%
142835 4 < 0.1%
122155 4 < 0.1%
116765 3 < 0.1%
109725 7 < 0.1%
108570 7 < 0.1%
107655 4 < 0.1%
101270 8 < 0.1%
96880 5 < 0.1%

Interactions

[81 interaction plots, one per ordered pair of the 9 numeric variables; images not reproduced]

Correlations


Auto

The auto setting selects an interpretable pairwise metric based on the types of the two columns, using the following mapping:
  • Variable_type-Variable_type: Method, Range
  • Categorical-Categorical: Cramér's V, [0, 1]
  • Numerical-Categorical: Cramér's V, [0, 1] (using a discretized numerical column)
  • Numerical-Numerical: Spearman's ρ, [-1, 1]
The number of bins used to discretize the numerical column in a Numerical-Categorical pair can be changed using config.correlations["auto"].n_bins. The number of bins determines the granularity of the association being measured.

This configuration uses the recommended metric for each pair of columns.
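To make the categorical branch of that mapping concrete, here is a minimal sketch of Cramér's V computed from a contingency table, together with a simple discretization step for a numerical column. It assumes the uncorrected textbook formula V = sqrt(χ² / (n · (min(r, c) - 1))) and equal-width bins. The report's own implementation may apply a bias correction or a different binning rule, and the names `cramers_v` and `equal_width_bins` are illustrative.

```python
import math
from collections import Counter
from itertools import product


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    chi2 = 0.0
    # Sum over every cell of the contingency table, including empty ones.
    for r, c in product(row_totals, col_totals):
        expected = row_totals[r] * col_totals[c] / n
        chi2 += (observed[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def equal_width_bins(values, n_bins):
    """Map each number to a bin index in [0, n_bins - 1] (assumed binning rule)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


perfect = cramers_v(list("aabb"), list("xxyy"))
independent = cramers_v(list("aabb"), list("xyxy"))
mixed = cramers_v(equal_width_bins([1, 2, 9, 10], 2), list("lluu"))
print(perfect, independent, mixed)
```

A larger `n_bins` lets the statistic pick up finer structure in the numerical column, at the cost of sparser contingency cells.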

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
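That recipe can be sketched in a few lines: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The version below assigns tied values their average rank, which is a common convention but an assumption here. The helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    return correlation(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]      # nonlinear but strictly increasing
rho = spearman_rho(xs, cubes)
linear_r = correlation(xs, cubes)  # the same formula on the raw values
print(rho, linear_r)
```

On the cubic toy data the rank-based coefficient reaches its maximum, while the same formula applied to the raw values stays below it, which is the nonlinear-monotonic case described above.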

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
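A direct transcription of that formula is sketched below, along with a check of the location-and-scale invariance mentioned above. The function name `pearson_r` and the toy data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # The 1/n factors of the covariance and the two deviations cancel out.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = pearson_r(xs, ys)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([2 * x + 7 for x in xs], [0.5 * y - 3 for y in ys])
print(r, r_rescaled)
```

Rescaling with a negative factor on exactly one variable flips the sign of r but leaves its magnitude unchanged.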

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
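The pair-counting definition translates directly into code. The sketch below implements that plain form (often called tau-a), in which tied pairs count as neither concordant nor discordant. Library implementations frequently use a tie-corrected variant instead, so results can differ when ties are present. The quadratic loop is meant for illustration, not for a column with hundreds of thousands of rows.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


tau_same = kendall_tau([1, 2, 3], [1, 2, 3])
tau_reversed = kendall_tau([1, 2, 3], [3, 2, 1])
tau_one_swap = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])  # 5 concordant, 1 discordant
print(tau_same, tau_reversed, tau_one_swap)
```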

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
VERSIE DATUM_BESTAND PEILDATUM JAAR BEHANDELEND_SPECIALISME_CD TYPERENDE_DIAGNOSE_CD ZORGPRODUCT_CD AANTAL_PAT_PER_ZPD AANTAL_SUBTRAJECT_PER_ZPD AANTAL_PAT_PER_DIAG AANTAL_SUBTRAJECT_PER_DIAG AANTAL_PAT_PER_SPC AANTAL_SUBTRAJECT_PER_SPC GEMIDDELDE_VERKOOPPRIJS
01.02022-11-212022-11-012018-01-013291499002901044979721982241701345.0
11.02022-11-212022-11-012018-01-01329079900290124094161439150021982241701040.0
21.02022-11-212022-11-012018-01-01329159900290101811891029107521982241701345.0
31.02022-11-212022-11-012018-01-0132909990029011161624252198224170545.0
41.02022-11-212022-11-012018-01-01329109900290122223939921982241701040.0
51.02022-11-212022-11-012018-01-0132914990029011616197972198224170545.0
61.02022-11-212022-11-012018-01-0132910990029002141493992198224170205.0
71.02022-11-212022-11-012018-01-0132912990029012343411412021982241701040.0
81.02022-11-212022-11-012018-01-013290399002901017918384786321982241701345.0
91.02022-11-212022-11-012018-01-0132910990029011495293992198224170545.0
Last rows
VERSIE DATUM_BESTAND PEILDATUM JAAR BEHANDELEND_SPECIALISME_CD TYPERENDE_DIAGNOSE_CD ZORGPRODUCT_CD AANTAL_PAT_PER_ZPD AANTAL_SUBTRAJECT_PER_ZPD AANTAL_PAT_PER_DIAG AANTAL_SUBTRAJECT_PER_DIAG AANTAL_PAT_PER_SPC AANTAL_SUBTRAJECT_PER_SPC GEMIDDELDE_VERKOOPPRIJS
3178801.02022-11-212022-11-012016-01-013032939903560592217132455133248118321042235.0
3178811.02022-11-212022-11-012016-01-013032061992990531111841309133248118321044580.0
3178821.02022-11-212022-11-012016-01-01303249199299074116287941332481183210410705.0
3178831.02022-11-212022-11-012012-01-0130343499699054119021012148764219394843505.0
3178841.02022-11-212022-11-012012-01-0130328019929908911111244114318148764219394842990.0
3178851.02022-11-212022-11-012014-01-013135231319992141123726710377012062495340.0
3178861.02022-11-212022-11-012012-01-01303348990003007112384320014876421939484100.0
3178871.02022-11-212022-11-012014-01-013139049900030042232791327010377012062495105.0
3178881.02022-11-212022-11-012014-01-01313919990003004221752190710377012062495105.0
3178891.02022-11-212022-11-012016-01-013034101920010761158069013324811832104NaN